The Exploration-Exploitation Trade-Off in Sequential Decision Making Problems
Sequential decision making problems require an agent to repeatedly choose from
a set of actions. Common to such problems is the exploration-exploitation
trade-off, where an agent must choose between exploiting the action expected to yield the
best reward and exploring an alternative action for potential future benefit.
The main focus of this thesis is to understand in more detail the role this
trade-off plays in various important sequential decision making problems, in terms
of maximising finite-time reward.
The most common and best studied abstraction of the exploration-exploitation
trade-off is the classic multi-armed bandit problem. In this thesis we study several
important extensions that are better suited than the classic problem to real-world
applications. These extensions include scenarios where the rewards of actions
change over time, or where the presence of other agents must repeatedly be considered. In
these contexts, the exploration-exploitation trade-off has a more complicated role
in terms of maximising finite-time performance. For example, the amount of exploration
required will constantly change in a dynamic decision problem, in multiagent
problems agents can explore by communication, and in repeated games, the
exploration-exploitation trade-off must be jointly considered with game theoretic
reasoning.
Existing techniques for balancing exploration-exploitation are focused on achieving
desirable asymptotic behaviour and are in general only applicable to basic decision
problems. The most flexible state-of-the-art approaches, ε-greedy and ε-first,
require exploration parameters to be set a priori, the optimal values of which are
highly dependent on the problem faced. To overcome this, we construct a novel algorithm, ε-ADAPT, which has no exploration parameters and can adapt exploration
on-line for a wide range of problems. ε-ADAPT is built on newly proven theoretical
properties of the ε-first policy, and we demonstrate that ε-ADAPT can accurately
learn not only how much to explore, but also when and which actions to explore.
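As a concrete reference point for the parameterised policies discussed above, the following sketch implements a basic ε-greedy policy on a Bernoulli multi-armed bandit. This is the textbook baseline, not the thesis's ε-ADAPT algorithm; the arm means, horizon, and value of ε are invented for illustration.

```python
import random

def epsilon_greedy(true_means, epsilon=0.1, horizon=2000, seed=0):
    """Run an epsilon-greedy policy on a Bernoulli multi-armed bandit.

    With probability epsilon a uniformly random arm is pulled (exploration);
    otherwise the arm with the highest empirical mean is pulled (exploitation).
    Returns the total reward and the number of pulls of each arm.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms     # pulls per arm
    means = [0.0] * n_arms    # empirical mean reward per arm
    total = 0
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                       # explore
        else:
            arm = max(range(n_arms), key=lambda a: means[a])  # exploit
        reward = 1 if rng.random() < true_means[arm] else 0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]     # running mean
        total += reward
    return total, counts

total, counts = epsilon_greedy([0.2, 0.5, 0.8])
```

Because ε is fixed a priori, a constant fraction of pulls is spent exploring even after the best arm has been identified; this is precisely the kind of problem-dependent parameter choice that an adaptive, parameter-free policy aims to remove.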
Frequency-Domain Stochastic Modeling of Stationary Bivariate or Complex-Valued Signals
There are three equivalent ways of representing two jointly observed
real-valued signals: as a bivariate vector signal, as a single complex-valued
signal, or as two analytic signals known as the rotary components. Each
representation has unique advantages depending on the system of interest and
the application goals. In this paper we provide a joint framework for all three
representations in the context of frequency-domain stochastic modeling. This
framework allows us to extend many established statistical procedures for
bivariate vector time series to complex-valued and rotary representations.
These include procedures for parametrically modeling signal coherence,
estimating model parameters using the Whittle likelihood, performing
semi-parametric modeling, and choosing between classes of nested models. We
also provide a new method of testing for impropriety in
complex-valued signals, which tests for noncircular or anisotropic second-order
statistical structure when the signal is represented in the complex plane.
Finally, we demonstrate the usefulness of our methodology in capturing the
anisotropic structure of signals observed from fluid dynamic simulations of
turbulence.

Comment: To appear in IEEE Transactions on Signal Processing.
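The equivalence of the three representations can be made concrete with a small sketch. Assuming a bivariate signal (u, v), the code below forms the complex-valued signal z = u + iv and splits it into its anticlockwise and clockwise rotary components by separating non-negative and negative discrete Fourier frequencies. A naive O(N^2) DFT keeps the sketch dependency-free; this illustrates the representations themselves, not the paper's stochastic modeling framework.

```python
import cmath
import math

def dft(z):
    """Naive discrete Fourier transform (O(N^2), for illustration only)."""
    n = len(z)
    return [sum(z[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
            for f in range(n)]

def idft(Z):
    """Inverse DFT matching dft() above."""
    n = len(Z)
    return [sum(Z[f] * cmath.exp(2j * math.pi * f * t / n) for f in range(n)) / n
            for t in range(n)]

def rotary_components(u, v):
    """Split the bivariate signal (u, v) into rotary components.

    The complex signal z = u + iv is decomposed as z = z_plus + z_minus,
    where z_plus (anticlockwise) carries the non-negative frequencies up to
    Nyquist and z_minus (clockwise) carries the negative frequencies.
    """
    z = [ui + 1j * vi for ui, vi in zip(u, v)]
    n = len(z)
    Z = dft(z)
    Z_plus = [Z[f] if f <= n // 2 else 0 for f in range(n)]
    Z_minus = [Z[f] if f > n // 2 else 0 for f in range(n)]
    return idft(Z_plus), idft(Z_minus)
```

For a purely anticlockwise rotating pair such as (u, v) = (cos ωt, sin ωt), all of the energy lands in the anticlockwise component, which is what makes the rotary representation natural for rotation-dominated, anisotropic systems.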
A Power Variance Test for Nonstationarity in Complex-Valued Signals
We propose a novel algorithm for testing the hypothesis of nonstationarity in
complex-valued signals. The implementation uses both the bootstrap and the Fast
Fourier Transform such that the algorithm can be efficiently implemented in
O(N log N) time, where N is the length of the observed signal. The test procedure
examines the second-order structure and contrasts the observed power variance -
i.e. the variability of the instantaneous variance over time - with the
expected characteristics of stationary signals generated via the bootstrap
method. Our algorithmic procedure is capable of detecting different types of
nonstationarity, such as jumps or strong sinusoidal components. We illustrate
the utility of our test and algorithm through application to turbulent flow
data from fluid dynamics.
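The test statistic and its resampling calibration can be sketched in simplified form. The version below substitutes phase-randomisation surrogates for the paper's bootstrap scheme and a naive O(N^2) DFT for the FFT, and the signal lengths and surrogate count are invented, so it conveys the idea rather than reproducing the published algorithm.

```python
import cmath
import math
import random

def dft(z):
    """Naive discrete Fourier transform (O(N^2), for illustration only)."""
    n = len(z)
    return [sum(z[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
            for f in range(n)]

def idft(Z):
    """Inverse DFT matching dft() above."""
    n = len(Z)
    return [sum(Z[f] * cmath.exp(2j * math.pi * f * t / n) for f in range(n)) / n
            for t in range(n)]

def power_variance(z):
    """Variance of the instantaneous power |z_t|^2 over time."""
    p = [abs(x) ** 2 for x in z]
    m = sum(p) / len(p)
    return sum((q - m) ** 2 for q in p) / len(p)

def nonstationarity_pvalue(z, n_surrogates=200, seed=0):
    """Compare the observed power variance against stationary surrogates
    that share the same power spectrum (random Fourier phases).
    A small p-value suggests the signal is nonstationary."""
    rng = random.Random(seed)
    Z = dft(z)
    observed = power_variance(z)
    exceed = sum(
        power_variance(idft([c * cmath.exp(2j * math.pi * rng.random())
                             for c in Z])) >= observed
        for _ in range(n_surrogates)
    )
    return (1 + exceed) / (1 + n_surrogates)
```

A signal whose amplitude jumps partway through produces an observed power variance far above that of its stationary surrogates, and hence a small p-value, whereas a constant-amplitude signal does not.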
Separating Mesoscale and Submesoscale Flows from Clustered Drifter Trajectories
Drifters deployed in close proximity collectively provide a unique observational data set with which to separate mesoscale and submesoscale flows. In this paper we provide a principled approach for doing so by fitting observed velocities to a local Taylor expansion of the velocity flow field. We demonstrate how to estimate mesoscale and submesoscale quantities that evolve slowly over time, as well as their associated statistical uncertainty. We show that in practice the mesoscale component of our model can explain much of the first- and second-moment variability in drifter velocities, especially at low frequencies. This results in much lower and more meaningful measures of submesoscale diffusivity, which would otherwise be contaminated by unresolved mesoscale flow. We quantify these effects theoretically by computing Lagrangian frequency spectra, and demonstrate the usefulness of our methodology through simulations as well as with real observations from the LatMix deployment of drifters. The outcome of this method is a full Lagrangian decomposition of each drifter trajectory into three components that represent the background, mesoscale, and submesoscale flow.
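The core fitting step, regressing observed velocities on a local Taylor expansion of the flow field, can be sketched as an ordinary least-squares problem. The sketch below handles one velocity component at a single time with made-up positions and coefficients; the paper's method additionally models slow time evolution and quantifies statistical uncertainty.

```python
def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x

def fit_local_velocity(xs, ys, us):
    """Least-squares fit of a first-order Taylor expansion
    u(x, y) ~ u0 + (du/dx)(x - xbar) + (du/dy)(y - ybar)
    to one velocity component observed at clustered drifter positions.
    Returns (u0, du/dx, du/dy)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    rows = [[1.0, x - xbar, y - ybar] for x, y in zip(xs, ys)]
    # normal equations: (A^T A) beta = A^T u
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atu = [sum(r[i] * u for r, u in zip(rows, us)) for i in range(3)]
    return solve3(ata, atu)
```

The fitted linear part plays the role of the resolved mesoscale flow; the residuals of this regression are what remain to be attributed to submesoscale motion.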
Detecting outlying demand in multi-leg bookings for transportation networks
Network effects complicate demand forecasting in general, and outlier detection in particular. For example, in transportation networks, sudden increases in demand for a specific destination will not only affect the legs arriving at that destination, but also connected legs nearby in the network. Network effects are particularly relevant when transport service providers, such as railway or coach companies, offer many multi-leg itineraries. In this paper, we present a novel method for generating automated outlier alerts, to support analysts in adjusting demand forecasts for reliable planning. To create such alerts, we propose a two-step method for detecting outlying demand from transportation network bookings. The first step clusters network legs to appropriately partition and pool booking patterns. The second step identifies outliers within each cluster to create a ranked alert list of affected legs. We show that this method outperforms analyses that independently consider each leg in a network, especially in highly connected networks where most passengers book multi-leg itineraries. We illustrate the method's applicability using empirical data obtained from Deutsche Bahn and a detailed simulation study. The latter demonstrates the robustness of the approach and quantifies the potential revenue benefits of adjusting for outlying demand in networks.
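The two-step structure can be illustrated with a deliberately simplified sketch: greedy single-link clustering of legs whose booking curves correlate strongly, followed by a robust z-score of each leg's latest demand against the pooled within-cluster history. The correlation threshold, the toy booking curves, and the scoring rule are invented for illustration and are much simpler than the paper's method.

```python
from statistics import mean, median

def correlate(a, b):
    """Pearson correlation of two equal-length booking curves."""
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den if den else 0.0

def cluster_legs(curves, threshold=0.8):
    """Step 1: greedy single-link clustering of legs by curve correlation."""
    clusters = []
    for leg in curves:
        for cluster in clusters:
            if any(correlate(curves[leg], curves[other]) >= threshold
                   for other in cluster):
                cluster.append(leg)
                break
        else:
            clusters.append([leg])
    return clusters

def alert_list(curves, threshold=0.8):
    """Step 2: rank legs by how far their latest demand sits from the
    pooled within-cluster history (robust z-score via median/MAD)."""
    scores = {}
    for cluster in cluster_legs(curves, threshold):
        history = [x for leg in cluster for x in curves[leg][:-1]]
        med = median(history)
        mad = median(abs(x - med) for x in history) or 1.0
        for leg in cluster:
            scores[leg] = abs(curves[leg][-1] - med) / mad
    return sorted(scores, key=scores.get, reverse=True)
```

Pooling history across correlated legs is what lets a genuinely outlying leg stand out even when its own booking history is short or noisy.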
A multivariate pseudo-likelihood approach to estimating directional ocean wave models
Ocean buoy data in the form of high frequency multivariate time series are routinely recorded at many locations in the world's oceans. Such data can be used to characterise the ocean wavefield, which is important for numerous socio-economic and scientific reasons. This characterisation is typically achieved by modelling the frequency-direction spectrum, which decomposes spatiotemporal variability by both frequency and direction. State-of-the-art methods for estimating the parameters of such models do not make use of the full spatiotemporal content of the buoy observations due to unnecessary assumptions and smoothing steps. We explain how the multivariate debiased Whittle likelihood can be used to jointly estimate all parameters of such frequency-direction spectra directly from the recorded time series. When applied to North Sea buoy data, debiased Whittle likelihood inference reveals smooth evolution of spectral parameters over time. We discuss challenging practical issues, including model misspecification, and provide guidelines for future application of the method.
The debiased Whittle likelihood
The Whittle likelihood is a widely used and computationally efficient pseudolikelihood. However, it is known to produce biased parameter estimates with finite sample sizes for large classes of models. We propose a method for debiasing Whittle estimates for second-order stationary stochastic processes. The debiased Whittle likelihood can be computed in the same O(n log n) operations as the standard Whittle approach. We demonstrate the superior performance of our method in simulation studies and in application to a large-scale oceanographic dataset, where in both cases the debiased approach reduces bias by up to two orders of magnitude, achieving estimates that are close to those of the exact maximum likelihood, at a fraction of the computational cost. We prove that the method yields estimates that are consistent at an optimal convergence rate of n^(-1/2) for Gaussian processes and for certain classes of non-Gaussian or nonlinear processes. This is established under weaker assumptions than in the standard theory, and in particular the power spectral density is not required to be continuous in frequency. We describe how the method can be readily combined with standard methods of bias reduction, such as tapering and differencing, to further reduce bias in parameter estimates.
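To make the construction concrete, the sketch below fits an AR(1) model by maximising the standard (biased) Whittle likelihood over a parameter grid. The debiasing step described above, which replaces the spectral density with the finite-sample expected periodogram, is omitted here for brevity; the AR(1) model, simulation settings, and grid are invented for illustration, and a naive O(n^2) DFT stands in for the FFT.

```python
import cmath
import math
import random

def periodogram(x):
    """I(omega_f) = |DFT(x)_f|^2 / n at the Fourier frequencies (naive DFT)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * f * t / n)
                    for t in range(n))) ** 2 / n for f in range(n)]

def ar1_spectrum(phi, sigma2, n):
    """AR(1) spectral density sigma2 / |1 - phi e^{-i omega_f}|^2."""
    return [sigma2 / abs(1 - phi * cmath.exp(-2j * math.pi * f / n)) ** 2
            for f in range(n)]

def whittle_estimate_phi(x, grid, sigma2=1.0):
    """Maximise the standard Whittle log-likelihood -sum(log f + I/f)
    over a grid of AR(1) coefficients (innovation variance held fixed)."""
    I = periodogram(x)   # computed once; reused for every candidate phi
    n = len(x)
    def loglik(phi):
        f = ar1_spectrum(phi, sigma2, n)
        return -sum(math.log(ff) + ii / ff for ff, ii in zip(f, I))
    return max(grid, key=loglik)

# simulate an AR(1) process with phi = 0.7 and unit innovation variance
rng = random.Random(1)
x, prev = [], 0.0
for _ in range(256):
    prev = 0.7 * prev + rng.gauss(0.0, 1.0)
    x.append(prev)

phi_hat = whittle_estimate_phi(x, [i / 100 for i in range(0, 96, 5)])
```

The debiased version keeps this exact structure but substitutes the expected periodogram for the model spectrum f, which is what removes the finite-sample bias while preserving the O(n log n) cost when an FFT is used.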